[PPT] Cluster Generation and Maintenance
File Format: Microsoft Powerpoint 97 - View as HTML

Apply phylogenetic criterion to merge clusters. Take reciprocal best hits among 
... Singleton clusters, with status: N. Log of ambiguous cases. Back to high ... 

mendel.stanford.edu/ProPhylER/AuxiliaryFiles/ProPhylER_Data_Flow.ppt - Similar pages

[PDF] OCR with No Shape Training

File Format: PDF/Adobe Acrobat - View as HTML 

merge some adjacent components like the dots on i's and ... The ratio is. 

retained for further use. Any singleton cluster is merged into the nearest larger ... 

cm.bell-labs.com/cm/cs/who/tkh/papers/noshape.pdf - Similar pages

[PS] Fast and Intuitive Clustering of Web Documents Oren Zamir ...
File Format: Adobe PostScript - View as Text 

This component captures the notion that singleton clusters are "bad". ... We are 
currently exploring the option of merging potential clusters with high ... 
File Format: Microsoft Powerpoint 97 - View as HTML

... based on the cardinality of the index and the cluster ratio of the index. ... 

What is an example of a singleton select and a select requiring a cursor? ... 

www.oti.fsu.edu/dba/2003_Database_Training/DB2_SQL/Module6Tuning.ppt - Similar pages

Efficient clustering of large EST data sets on parallel computers 

(b) Four types of overlaps accepted as indication to merge clusters, ... 

Distribution of the number singleton and non-singleton clusters for benchmark set ... 

www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=12771222 - Similar pages

A functional hierarchical organization of the protein sequence space 
This is an Open Access article distributed under the terms of the Creative ... 
The larger the PL, the later the merging that created the cluster took place. ... 

www.pubmedcentral.nih.gov/articlerender.fcgi?artid=54456S - Similar pages

BioMed Central | Full text | Large scale hierarchical clustering ... 
Later on, sequences merging in correspond to weakly related proteins. ... 
Access to the cluster set. The SYSTERS cluster set [17] is available over the ... 

www.biomedcentral.com/1471-2105/6/15 - 106k - Cached - Similar pages
This is an Open Access article distributed under the terms of the Creative ...
positive outcome helps in merging of two clusters. As a result, ... singleton clusters.
1 Similarity-driven clu ...

Xuejian Xiong, Kap Luk Chan, Kian Lee Tan 

July 2004 Proceedings of the 20th conference on Uncertainty in artificial intelligence 
AUAI '04 

Full text available: pdf(459.38 KB) Additional Information: full citation, abstract, references

In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy 
clustering. The cluster merging method is used to resolve the problem of cluster validation. 
Starting with an overspecified number of clusters in the data, pairs of similar clusters are 
merged based on the proposed similarity-driven cluster merging criterion. The similarity 
between clusters is calculated by a fuzzy cluster similarity matrix, while an adaptive 
threshold is used for merging. In addition ... 
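The merging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes crisp (non-fuzzy) clusters of 1-D values, a fixed rather than adaptive threshold, and a hypothetical `centroid_similarity` function standing in for the fuzzy cluster similarity matrix.

```python
from itertools import combinations


def centroid_similarity(a, b):
    """Hypothetical similarity: 1 / (1 + distance between 1-D centroids)."""
    ca = sum(a) / len(a)
    cb = sum(b) / len(b)
    return 1.0 / (1.0 + abs(ca - cb))


def merge_similar_clusters(clusters, similarity, threshold):
    """Merge the most similar pair of clusters until no pair reaches `threshold`.

    Assumption: `threshold` is fixed here; the abstract describes an
    adaptive threshold, which this sketch does not attempt to reproduce.
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Score every pair and pick the most similar one.
        (i, j), best = max(
            (
                ((i, j), similarity(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # i < j always holds, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    start = [[1.0, 1.2], [1.1], [5.0], [5.2, 5.1]]  # overspecified clusters
    print(merge_similar_clusters(start, centroid_similarity, threshold=0.5))
```

Starting from an overspecified set of four clusters, the loop stops once the best remaining pair falls below the threshold.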



2 Research ... methodology

David Cheng, Santosh Vempala, Ravi Kannan, Grant Wang 

June 2005 Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium 
on Principles of database systems 

Full text available: pdf(791.76 KB) Additional Information: full citation, abstract, references

We present a divide-and-merge methodology for clustering a set of objects that combines a 
top-down "divide" phase with a bottom-up "merge" phase. In contrast, previous algorithms 
either use top-down or bottom-up methods to construct a hierarchical clustering or produce 
a flat clustering using local search (e.g., k-means). Our divide phase produces a tree whose
leaves are the elements of the set. For this phase, we use an efficient spectral algorithm. 
The merge phase quickly finds an optim ... 



3 Poster ... merging

Cheng-Ru Lin, Ming-Syan Chen 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(... KB) Additional Information: full citation, abstract, references, index terms

Data clustering has attracted a lot of research attention in the field of computational 
statistics and data mining. In most related studies, the dissimilarity between two clusters is 
defined as the distance between their centroids, or the distance between two closest (or 
farthest) data points. However, all of these measurements are vulnerable to outliers, and
removing the outliers precisely is yet another difficult task. In view of this, we propose a
new similarity measurement referred to as coh ... 
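The three conventional dissimilarity definitions named in this abstract (centroid distance, closest pair, farthest pair) can be sketched as follows. This is only an illustration of those baseline measures, not of the new measurement the paper proposes; the function names are hypothetical and points are assumed to be equal-length numeric tuples.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def centroid_distance(a, b):
    """Distance between the centroids of clusters a and b."""
    return math.dist(centroid(a), centroid(b))


def closest_pair_distance(a, b):
    """Distance between the two closest points, one from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)


def farthest_pair_distance(a, b):
    """Distance between the two farthest points, one from each cluster."""
    return max(math.dist(p, q) for p in a for q in b)


if __name__ == "__main__":
    a = [(0, 0), (0, 2)]
    b = [(3, 0), (3, 2)]
    b_with_outlier = b + [(103, 1)]  # one far-away point added to b
    for name, fn in [
        ("centroid", centroid_distance),
        ("closest", closest_pair_distance),
        ("farthest", farthest_pair_distance),
    ]:
        print(name, fn(a, b), fn(a, b_with_outlier))
```

In this example a single outlier shifts the centroid and farthest-pair values substantially, which is the kind of sensitivity the abstract refers to.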

The merge/purge problem for large databases
Mauricio A. Hernandez, Salvatore J. Stolfo 

May 1995 ACM SIGMOD Record, Proceedings of the 1995 ACM SIGMOD international
conference on Management of data, volume 24 issue 2
Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

Many commercial organizations routinely gather large numbers of databases for various 
marketing and business analysis functions. The task is to correlate information from 
different databases by identifying distinct individuals that appear in a number of different 
databases typically in an inconsistent and often incorrect fashion. The problem we study 
here is the task of merging data from multiple sources in as efficient manner as possible, 
while maximizing the accuracy of the result. We call thi ... 

A parallel algorithm fo ...
Edward Omiecinski, Peter Scheuermann 

December 1990 ACM Transactions on Database Systems (TODS), volume 15 issue 4

Full text available: pdf(1.82 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We present an efficient heuristic algorithm for record clustering that can run on a SIMD 
machine. We introduce the P-tree, and its associated numbering scheme, which in the split 
phase allows each processor independently to compute the unique cluster number of a 
record satisfying an arbitrary query. We show that by restricting ourselves in the merge 
phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio 
is optimal in the number of processors used. Final ... 

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data, volume 27 issue 2

Full text available: pdf(1.71 MB) Additional Information: full citation, abstract, references, citings, index terms

Clustering, in data mining, is useful for discovering groups and identifying interesting 
distributions in the underlying data. Traditional clustering algorithms either favor clusters 
with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We 
propose a new clustering algorithm called CURE that is more robust to outliers, and 
identifies clusters having non-spherical shapes and wide variances in size. CURE achieves 
this by representing each cluster by a certai ... 

Similarity querying II: QCluster: relevance feedback using adaptive clustering for
content-based image retrieval 
Deok-Hwan Kim, Chin-Wan Chung 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(2.15 MB) Additional Information: full citation, abstract, references, citings, index terms

The learning-enhanced relevance feedback has been one of the most active research areas 
in content-based image retrieval in recent years. However, few methods using the 
relevance feedback are currently available to process relatively complex queries on large 
image databases. In the case of complex image queries, the feature space and the distance 
function of the user's perception are usually different from those of the system. This
difference leads to the representation of a query with multiple ...

Keywords: classification, cluster-merging, content-based image retrieval, image database, 
relevance feedback 



8 Increment ...
Moses Charikar, Chandra Chekuri, Tomás Feder, Rajeev Motwani

May 1997 Proceedings of the twenty-ninth annual ACM symposium on Theory of 
computing 

Full text available: pdf(1.58 MB) Additional Information: full citation, references, citings, index terms



Research sessions: clustering: Clustering objects on a spatial network 
Man Lung Yiu, Nikos Mamoulis

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(867.67 KB) Additional Information: full citation, abstract, references

Clustering is one of the most important analysis tasks in spatial databases. We study the 
problem of clustering objects, which lie on edges of a large weighted spatial network. The 
distance between two objects is defined by their shortest path distance over the network. 
Past algorithms are based on the Euclidean distance and cannot be applied for this setting. 
We propose variants of partitioning, density-based, and hierarchical methods. Their 
effectiveness and efficiency is evaluated for collect ... 
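The network distance used in this abstract, the shortest path distance between two objects, can be sketched with Dijkstra's algorithm. This is a simplified illustration rather than the paper's method: it assumes objects sit on network nodes instead of on edges, and the adjacency-dictionary graph format and function name are hypothetical.

```python
import heapq


def shortest_path_distance(graph, source, target):
    """Dijkstra's algorithm over {node: {neighbor: weight}}.

    Assumes non-negative edge weights. Returns float('inf') when
    `target` cannot be reached from `source`.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")


if __name__ == "__main__":
    road_network = {
        "a": {"b": 1, "c": 4},
        "b": {"a": 1, "c": 2},
        "c": {"a": 4, "b": 2, "d": 1},
        "d": {"c": 1},
    }
    print(shortest_path_distance(road_network, "a", "d"))
```

A clustering routine written against a generic distance function could take this in place of Euclidean distance; handling objects that lie partway along an edge would need extra bookkeeping not shown here.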



10 A unified framework for model-based clustering 
Shi Zhong, Joydeep Ghosh 

December 2003 The Journal of Machine Learning Research, Volume 4 
Full text available: pdf(851.4... KB) Additional Information: full citation, abstract, index terms

Model-based clustering techniques have been widely used and have shown promising 
results in many applications involving complex data. This paper presents a unified 
framework for probabilistic model-based clustering based on a bipartite graph view of data 
and models that highlights the commonalities and differences among existing model-based 
clustering algorithms. In this view, clusters are represented as probabilistic models in a 
model space that is conceptually separate from the data space. For ... 

11 Improved merging of datapath operators using information content and required 
precision analysis 
Anmol Mathur, Sanjeev Saluja 

June 2001 Proceedings of the 38th conference on Design automation 

Full text available: pdf(217.77 KB) Additional Information: full citation, abstract, references, citings, index terms

We introduce the notions of required precision and information content of datapath signals 
and use them to define functionally safe transformations on data flow graphs. These
transformations reduce widths of datapath operators and enhance their mergeability. Using 
efficient algorithms to compute required precision and information content of signals, we 
define a new algorithm for partitioning a data flow graph consisting of datapath operators 
into mergeable clusters. Experimental results indic ...

12 Web clustering: Evaluation of hierarchical clustering algorithms for document datasets
Ying Zhao, George Karypis 

November 2002 Proceedings of the eleventh international conference on Information
and knowledge management

Full text available: pdf(... KB) Additional Information: full citation, abstract, references, citings, index terms

Fast and high-quality document clustering algorithms play an important role in providing 
intuitive navigation and browsing mechanisms by organizing large amounts of information 
into a small number of meaningful clusters. In particular, hierarchical clustering solutions 
provide a view of the data at different levels of granularity, making them ideal for people to 
visualize and interactively explore large document collections. In this paper we evaluate 
different partitional and agglomerative approa ... 

Keywords: agglomerative clustering, hierarchical clustering, partitional clustering 



13 Data mining (DM): GraphZip: a fast and automatic compression method for spatial data
clustering

Yu Qian, Kang Zhang 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing 

Full text available: pdf(630... KB) Additional Information: full citation, abstract, references, citings

Spatial data mining presents new challenges due to the large size and the high 
dimensionality of spatial data. A common approach to such challenges is to perform some 
form of compression on the initial databases and then process the compressed data. This 
paper presents a novel spatial data compression method, called GraphZip, to produce a 
compact representation of the original data set. GraphZip has two advantages: first, the 
spatial pattern of the original data set is preserved in the compresse ... 

Keywords: clustering, data compression, spatial databases 



14 Word clustering and disambiguation based on co-occurrence data

Hang Li, Naoki Abe 

August 1998 Proceedings of the 17th international conference on Computational 
linguistics - Volume 2 , Proceedings of the 36th annual meeting on 
Association for Computational Linguistics - Volume 2 

Full text available: pdf(647.25 KB) Publisher Site Additional Information: full citation, abstract, references

We address the problem of clustering words (or constructing a thesaurus) based on co- 
occurrence data, and using the acquired word classes to improve the accuracy of syntactic 
disambiguation. We view this problem as that of estimating a joint probability distribution 
specifying the joint probabilities of word pairs, such as noun verb pairs. We propose an 
efficient algorithm based on the Minimum Description Length (MDL) principle for estimating 
such a probability distribution. Our method is a natu ... 

15 Hierarchical face clustering on polygonal surfaces
Michael Garland, Andrew Willmott, Paul S. Heckbert 

March 2001 Proceedings of the 2001 symposium on Interactive 3D graphics 

Full text available: pdf(...) Additional Information: full citation, references, citings, index terms



Keywords: dual contraction, face clusters, quadric error metrics, spatial data structures, 
surface simplification 
F. Bertails, T-Y. Kim, M-P. Cani, U. Neumann

July 2003 Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on
Computer animation

Full text available: pdf(1.88 MB) Additional Information: full citation, abstract, references, citings, index terms



Realistic animation of long human hair is difficult due to the number of hair strands and to 
the complexity of their interactions. Existing methods remain limited to smooth, uniform, 
and relatively simple hair motion. We present a powerful adaptive approach to modeling 
dynamic clustering behavior that characterizes complex long-hair motion. The Adaptive 
Wisp Tree (AWT) is a novel control structure that approximates the large-scale coherent 
motion of hair clusters as well as small-scaled variatio ... 

17 ...

Lucas Roh, Walid A. Najjar, A. P. Wim Bohm 

July 1993 Proceedings of the conference on Functional programming languages and 
computer architecture 

Full text available: pdf(993.02 KB) Additional Information: full citation, references, citings, index terms



T. H. Merrett 

January 1983 ACM SIGMOD Record, volume 13 issue 2 

Full text available: pdf(427.20 KB) Additional Information: full citation, abstract, references

To join two relations efficiently, they must not only be clustered but mutually clustered. 
Sorting is the only known way to achieve mutual clustering. Once the relations are sorted, 
merging is the obvious way to implement the join. If the relations are known to be sorted 
appropriately, the most costly part of the process can be omitted. To know that a relation is 
sorted already, it is best to remember that we sorted it. Otherwise detecting that the 
relation is sorted requires inspection o ... 

Keywords: clustering, merging, mutual clustering, natural join, page-pair graphs, 
relational algebra, sorting 
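The sort-then-merge join described in this abstract can be sketched as below. It is a generic in-memory sort-merge join under stated assumptions, not the paper's page-level analysis: rows are tuples, the join key is the first field by default, and the `presorted` flag is a hypothetical way to model "remembering that we sorted it" so the costly sort can be skipped.

```python
def sort_merge_join(left, right, key=lambda row: row[0], presorted=False):
    """Equi-join two lists of rows by sorting both and merging them.

    If `presorted` is True the caller asserts that both inputs are
    already ordered by `key`, and the sort step is skipped.
    """
    if not presorted:
        left = sorted(left, key=key)
        right = sorted(right, key=key)

    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and key(left[i]) == kl:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


if __name__ == "__main__":
    employees = [(2, "b"), (1, "a"), (2, "c")]
    departments = [(2, "x"), (3, "y"), (1, "z")]
    print(sort_merge_join(employees, departments))
```

Once both inputs are ordered, the merge is a single pass over each relation plus the output, which is why the sort dominates the cost when it cannot be skipped.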

19 Fast hierarc ...
David Eppstein 

January 1998 Proceedings of the ninth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(1.13 MB) Additional Information: full citation, references, citings, index terms



Poster Sessions: Hierarchical clustering of words
Akira Ushioda 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 2

Full text available: pdf(... KB) Additional Information: full citation, abstract, references, citings

This paper describes a data-driven method for hierarchical clustering of words in which a 
large vocabulary of English words is clustered bottom-up, with respect to corpora ranging in 
size from 5 to 50 million words, using a greedy algorithm that tries to minimize average 
loss of mutual information of adjacent classes. The resulting hierarchical clusters of words
are then naturally transformed to a bit-string representation of (i.e. word bits for) all the
words in the vocabulary. I ... 